I have set echo=FALSE so that most of the code chunks will not display. Please refer to the .Rmd file for source code.

1 Loading packages and importing data

1.1 Packages

Details of packages, etc, are given below.

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] magick_2.3      patchwork_1.0.0 ggthemes_4.2.0  see_0.2.1      
##  [5] latex2exp_0.4.0 ggridges_0.5.1  tidybayes_2.0.2 brms_2.12.0    
##  [9] Rcpp_1.0.3      forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3    
## [13] purrr_0.3.3     readr_1.3.1     tidyr_1.0.2     tibble_2.1.3   
## [17] ggplot2_3.2.1   tidyverse_1.2.1 bookdown_0.18  
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1          rsconnect_0.8.15         
##  [3] markdown_1.1              base64enc_0.1-3          
##  [5] rstudioapi_0.10           rstan_2.19.2             
##  [7] svUnit_0.7-12             DT_0.8                   
##  [9] fansi_0.4.1               lubridate_1.7.4          
## [11] xml2_1.2.2                bridgesampling_0.7-2     
## [13] knitr_1.25                shinythemes_1.1.2        
## [15] bayesplot_1.7.0           jsonlite_1.6             
## [17] broom_0.5.2               shiny_1.3.2              
## [19] compiler_3.6.1            httr_1.4.1               
## [21] backports_1.1.5           assertthat_0.2.1         
## [23] Matrix_1.2-17             lazyeval_0.2.2           
## [25] cli_2.0.1                 later_0.8.0              
## [27] htmltools_0.3.6           prettyunits_1.1.1        
## [29] tools_3.6.1               igraph_1.2.4.1           
## [31] coda_0.19-3               gtable_0.3.0             
## [33] glue_1.3.1                reshape2_1.4.3           
## [35] cellranger_1.1.0          vctrs_0.2.2              
## [37] nlme_3.1-140              crosstalk_1.0.0          
## [39] insight_0.5.0             xfun_0.8                 
## [41] ps_1.3.0                  rvest_0.3.4              
## [43] mime_0.7                  miniUI_0.1.1.1           
## [45] lifecycle_0.1.0           gtools_3.8.1             
## [47] zoo_1.8-6                 scales_1.1.0             
## [49] colourpicker_1.0          hms_0.5.0                
## [51] promises_1.0.1            Brobdingnag_1.2-6        
## [53] parallel_3.6.1            inline_0.3.15            
## [55] shinystan_2.5.0           yaml_2.2.0               
## [57] gridExtra_2.3             loo_2.2.0                
## [59] StanHeaders_2.19.0        stringi_1.4.5            
## [61] bayestestR_0.3.0          dygraphs_1.1.1.6         
## [63] pkgbuild_1.0.6            rlang_0.4.4              
## [65] pkgconfig_2.0.3           matrixStats_0.55.0       
## [67] evaluate_0.14             lattice_0.20-38          
## [69] rstantools_2.0.0          htmlwidgets_1.3          
## [71] tidyselect_0.2.5          processx_3.4.1           
## [73] plyr_1.8.5                magrittr_1.5             
## [75] R6_2.4.1                  generics_0.0.2           
## [77] pillar_1.4.3              haven_2.1.1              
## [79] withr_2.1.2               xts_0.11-2               
## [81] abind_1.4-5               modelr_0.1.5             
## [83] crayon_1.3.4              arrayhelpers_1.0-20160527
## [85] rmarkdown_2.1             grid_3.6.1               
## [87] readxl_1.3.1              callr_3.4.1              
## [89] threejs_0.3.1             digest_0.6.23            
## [91] xtable_1.8-4              httpuv_1.5.1             
## [93] stats4_3.6.1              munsell_0.5.0            
## [95] shinyjs_1.0

1.2 Import Data

Here we import our data and make some summary plots.

1.2.1 EEG Data

Here we import the primary EEG data set. This is odd-harmonic filtered data from a region of interest consisting of six electrodes over occipital cortex.

Figure 1.1: RMS data from two of the wallpaper groups, P2 and P4M. Odd harmonics are shown in A and B, while even harmonic data are shown in C and D. Occipital and parietal regions of interest are shown in dark and light gray, respectively. The two groups elicit very different response amplitudes for odd harmonics over occipital cortex, but for even harmonics those differences are much less pronounced.

## Observations: 400
## Variables: 3
## $ wg      <chr> "P2", "PM", "PG", "CM", "PMM", "PMG", "PGG", "CMM", "P4"…
## $ subject <chr> "s01", "s01", "s01", "s01", "s01", "s01", "s01", "s01", …
## $ rms     <dbl> 0.4013, 0.6555, 0.5547, 0.7635, 0.9185, 0.7285, 0.4320, …

The distribution is clearly skewed, and negative RMS values are impossible, so we will fit a GLM with family = 'lognormal'.

Figure 1.2: Data are the root-mean-square (RMS) over the odd-harmonic filtered waveforms.
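The reasoning behind the lognormal family can be illustrated with a small sketch. The data below are simulated stand-ins (not the study data), and the `skewness` helper and the parameter values are assumptions chosen only for illustration: a positive, right-skewed variable becomes roughly symmetric once it is log-transformed.

```r
# Simulated stand-in for a positive, right-skewed measurement (NOT the study data)
set.seed(1)
y <- rlnorm(400, meanlog = -0.4, sdlog = 0.45)

# Simple moment-based skewness (hypothetical helper)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw <- skewness(y)       # clearly positive on the original scale
skew_log <- skewness(log(y))  # close to zero after taking logs

# A lognormal variable is strictly positive by construction
all_positive <- all(y > 0)
```

If the skewness largely disappears on the log scale, a lognormal likelihood is a reasonable first choice.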

1.2.2 Threshold Data

Here we import the data and select the columns that we’re interested in. The threshold gives the display duration (in seconds) required for the two stimuli to be discriminated accurately.

## Observations: 186
## Variables: 3
## $ subject   <chr> "person10", "person10", "person10", "person10", "perso…
## $ wg        <chr> "CM", "CMM", "P2", "P3", "P31M", "P3M1", "P4", "P4G", …
## $ threshold <dbl> 0.74125, 0.20216, 0.47697, 0.35012, 0.24529, 0.19022, …
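As a rough sketch of the import-and-select step, the snippet below writes a tiny temporary file and reads it back. The file itself and the `session` column are hypothetical placeholders; only the three retained column names, the first few values and the `d_dispthresh` name come from the output shown in this document.

```r
# Hypothetical raw file: the file and the 'session' column are placeholders
raw_file <- tempfile(fileext = ".csv")
writeLines(c(
  "subject,session,wg,threshold",
  "person10,1,CM,0.74125",
  "person10,1,CMM,0.20216",
  "person10,1,P2,0.47697"), raw_file)

# Import, then keep only the three columns used in the analysis
d_dispthresh <- read.csv(raw_file, stringsAsFactors = FALSE)
d_dispthresh <- d_dispthresh[, c("subject", "wg", "threshold")]
str(d_dispthresh)
```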

As above, a summary of the data.

Figure 1.3: Histogram of the display duration thresholds.

Again, we have a skewed distribution, so will fit with family = 'lognormal'.

1.2.3 Control Data

In addition to the primary EEG data set, we are also importing two control data sets which are (a) even harmonic data from the same occipital electrodes, and (b) odd harmonic data from six parietal electrodes (see Figure 1.1 and the main paper).

2 Bayesian Analysis

Here are the details of the Bayesian multi-level modelling. Our general approach is:

2.1 Define Priors

In this section we will specify some priors. We then use a prior-predictive check to assess whether the priors are reasonable or not (i.e., on the same order of magnitude as our measurements).

2.1.1 Fixed Effects

Our independent variable is a categorical factor with 16 levels. We will drop the intercept from our model and instead fit a coefficient for each factor level (\(y \sim x - 1\)). As our dependent variable will be log-transformed, we can use the priors below:

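# Priors are on the log scale, as the response is modelled as lognormal
prior <- c(
  set_prior("normal(0,2)", class = "b"),      # population-level coefficients
  set_prior("cauchy(0,2)", class = "sigma"))  # residual sd; brms bounds sigma at zero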
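To make the no-intercept coding concrete, here is a minimal base-R sketch using a toy factor with three of the sixteen groups (the toy vector is an assumption for illustration). It also shows what a normal(0, 2) prior on the log scale implies on the original scale.

```r
# Toy factor with three of the sixteen wallpaper groups
wg <- factor(c("P2", "P4M", "PG", "P2", "P4M", "PG"))

# Default coding: an intercept plus contrasts against the first level
X_default <- model.matrix(~ wg)
colnames(X_default)    # "(Intercept)" "wgP4M" "wgPG"

# Dropping the intercept: one indicator column, and so one coefficient, per level
X_cellmeans <- model.matrix(~ wg - 1)
colnames(X_cellmeans)  # "wgP2" "wgP4M" "wgPG"

# Central 95% range implied by normal(0, 2) once exponentiated
prior_range <- exp(qnorm(c(0.025, 0.975), mean = 0, sd = 2))  # roughly 0.02 to 50
```

With this coding each coefficient is directly the mean log response for one group, so a single prior can be placed on all sixteen coefficients.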

2.1.2 Group-level Effects

We will keep the default weakly informative priors for the group-level (‘random’) effects. From the brms documentation:

[…] restricted to be non-negative and, by default, have a half student-t prior with 3 degrees of freedom and a scale parameter that depends on the standard deviation of the response after applying the link function. Minimally, the scale parameter is 10. This prior is used (a) to be only very weakly informative in order to influence results as few as possible, while (b) providing at least some regularization to considerably improve convergence and sampling efficiency.
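To get a feel for how weak this default is, the sketch below draws from a half student-t distribution with 3 degrees of freedom and a scale of 10 (the minimal scale quoted above). It is only an illustration in base R, not the code brms runs internally.

```r
set.seed(1)

# Half student-t(3, 0, 10): absolute value of a scaled t variate
sd_prior <- abs(10 * rt(1e5, df = 3))

# Median and upper 95% quantile of the implied prior on the group-level sd
sd_summary <- quantile(sd_prior, c(0.5, 0.95))
sd_summary
```

Because the group-level standard deviation acts on the log scale here, values of this size correspond to very large between-participant differences, which is why the prior is described as only weakly informative.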

2.1.3 Prior Predictive Check

Now we can specify our Bayesian multi-level model and priors. Note that as we are using sample_prior = 'only', the model will not learn anything from our data.

# Prior-only model: with sample_prior = 'only' the likelihood is ignored
m_prior <- brm(
  rms ~ wg - 1 + (1 | subject),
  data = d_eeg,
  family = "lognormal",
  prior = prior,
  iter = n_iter,
  sample_prior = 'only')

We can use this model to generate data.

## Observations: 320,000
## Variables: 2
## $ key   <chr> "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2", "P2"…
## $ value <dbl> 1.98845697, 9.26489539, 9.71836157, 0.07553109, 0.27777647…

Figure 2.1: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

We can see that i) our priors are relatively weak, as the predictions span several orders of magnitude, and ii) our empirical data fall within this range.
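The same check can be approximated without Stan. The sketch below is a simplified stand-in for the brms prior-only model: it draws from the two priors defined above, ignores the group-level intercepts (an assumption made to keep it short), and works on the log scale to avoid overflow from the heavy-tailed Cauchy prior.

```r
set.seed(1)
n_draws <- 20000

# Draw parameters from the priors (group-level intercepts are ignored here)
b     <- rnorm(n_draws, mean = 0, sd = 2)                # normal(0, 2), log scale
sigma <- abs(rcauchy(n_draws, location = 0, scale = 2))  # half-Cauchy(0, 2)

# Prior predictive draws of log(rms)
log_rms <- rnorm(n_draws, mean = b, sd = sigma)

# 95% and 66% prior predictive intervals in log10 units (orders of magnitude)
pp_int <- quantile(log_rms / log(10), c(0.025, 0.17, 0.83, 0.975))

# Range of the RMS values visible in the data preview in Section 1.2.1
obs_range <- log10(c(0.4013, 0.9185))
```

Comparing `pp_int` with `obs_range` gives the same qualitative picture as the figure: the prior predictions cover several orders of magnitude and comfortably contain the observed values.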

2.2 Compute Posterior

2.2.1 Fit Model to EEG Data

We will now fit the model to the data.

##  Family: lognormal 
##   Links: mu = identity; sigma = identity 
## Formula: rms ~ wg - 1 + (1 | subject) 
##    Data: d_eeg (Number of observations: 400) 
## Samples: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 20000
## 
## Group-Level Effects: 
## ~subject (Number of levels: 25) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.38      0.06     0.28     0.51 1.01     1426     1756
## 
## Population-Level Effects: 
##        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## wgCM      -0.48      0.09    -0.66    -0.31 1.00      869     1662
## wgCMM     -0.12      0.09    -0.30     0.06 1.00      928     1848
## wgP2      -0.95      0.09    -1.12    -0.77 1.00      916     1827
## wgP3      -0.82      0.09    -0.99    -0.64 1.00      928     1695
## wgP31M    -0.43      0.09    -0.61    -0.25 1.00      893     1772
## wgP3M1    -0.14      0.09    -0.31     0.04 1.00      870     1695
## wgP4      -0.48      0.09    -0.66    -0.31 1.00      921     1796
## wgP4G     -0.25      0.09    -0.43    -0.08 1.00      911     1812
## wgP4M      0.15      0.09    -0.02     0.33 1.00      894     1818
## wgP6      -0.48      0.09    -0.66    -0.30 1.00      891     1783
## wgP6M     -0.04      0.09    -0.22     0.13 1.00      866     1708
## wgPG      -0.93      0.09    -1.11    -0.76 1.00      873     1713
## wgPGG     -0.72      0.09    -0.90    -0.55 1.00      886     1660
## wgPM      -0.59      0.09    -0.77    -0.42 1.00      904     1824
## wgPMG     -0.26      0.09    -0.43    -0.08 1.00      910     1813
## wgPMM     -0.07      0.09    -0.24     0.11 1.00      907     1734
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.23      0.01     0.21     0.25 1.00     7758    10394
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We will now look at the model’s predictions for the average participant (i.e., ignoring the random intercepts).

Figure 2.2: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.
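Because the coefficients are on the log scale, it can help to convert them back to RMS units. The sketch below plugs in the posterior means for P2, P4M and sigma from the summary table above. Plugging in point estimates is an approximation (the proper approach transforms every posterior draw), and the between-participant variance is ignored.

```r
# Posterior means copied from the EEG model summary (log scale)
mu    <- c(P2 = -0.95, P4M = 0.15)
sigma <- 0.23

median_rms <- exp(mu)                # median of a lognormal
mean_rms   <- exp(mu + sigma^2 / 2)  # mean of a lognormal

# The ratio of medians is the exponentiated difference of the coefficients
ratio <- exp(mu[["P4M"]] - mu[["P2"]])
round(cbind(median_rms, mean_rms), 2)
```

On this reading, a difference of 1.10 in log(rms) between the two groups corresponds to roughly a threefold difference in typical response amplitude.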

2.2.2 Fit Model to Psychophysical Data

We will now fit the model to the data.

##  Family: lognormal 
##   Links: mu = identity; sigma = identity 
## Formula: threshold ~ wg - 1 + (1 | subject) 
##    Data: d_dispthresh (Number of observations: 186) 
## Samples: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
##          total post-warmup samples = 20000
## 
## Group-Level Effects: 
## ~subject (Number of levels: 12) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.39      0.11     0.24     0.66 1.00     2870     4657
## 
## Population-Level Effects: 
##        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## wgCM      -0.22      0.16    -0.53     0.11 1.00     2403     4638
## wgCMM     -1.15      0.16    -1.47    -0.82 1.00     2306     4800
## wgP2      -0.29      0.17    -0.62     0.06 1.00     2390     4846
## wgP3      -0.69      0.16    -1.00    -0.36 1.00     2419     4513
## wgP31M    -1.18      0.16    -1.49    -0.85 1.00     2299     4123
## wgP3M1    -1.34      0.16    -1.66    -1.02 1.00     2360     4265
## wgP4      -0.89      0.16    -1.21    -0.56 1.00     2458     4682
## wgP4G     -1.22      0.17    -1.54    -0.88 1.00     2417     4194
## wgP4M     -1.29      0.16    -1.60    -0.96 1.00     2305     4474
## wgP6      -1.20      0.17    -1.52    -0.86 1.00     2489     4469
## wgP6M     -1.41      0.17    -1.74    -1.07 1.00     2248     5050
## wgPG       0.36      0.17     0.04     0.71 1.00     2305     4597
## wgPGG     -0.31      0.17    -0.62     0.02 1.00     2327     4833
## wgPM      -0.79      0.17    -1.11    -0.44 1.00     2457     4343
## wgPMG     -0.96      0.16    -1.28    -0.64 1.00     2330     4618
## wgPMM     -1.17      0.16    -1.49    -0.84 1.00     2355     4413
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.42      0.02     0.37     0.46 1.00    10723    13316
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

2.2.3 EEG Control Data

We will also fit models to the control data. As we can see from Figure 2.4, the group differences are much smaller.

## Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
## Running the chains for more iterations may help. See
## http://mc-stan.org/misc/warnings.html#bulk-ess

Figure 2.3: The density plot shows the distribution of the empirical data, while the blue line shows the 66% and 95% prediction intervals.

3 Subgroup Comparisons

We will now compute the difference between sub- and super-groups.

3.1 Primary EEG Data

Finally, we calculate the probability that the RMS difference between subgroup and supergroup is larger than zero given the data. This information is then binned so we can colour in the posterior density plots.

Now we will use it!

Figure 3.1: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicates the probability of the difference being greater than zero.
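The probability calculation can be sketched as follows for P2 and P4M, the two groups shown in Figure 1.1. The draws here are independent normal stand-ins built from the summary table, whereas the real posterior draws come from the fitted model and are correlated. The sign convention and the bin edges are assumptions for illustration.

```r
set.seed(1)
n_draws <- 20000

# Stand-in posterior draws built from the summary table (Estimate, Est.Error)
draws_p4m <- rnorm(n_draws, mean = 0.15, sd = 0.09)
draws_p2  <- rnorm(n_draws, mean = -0.95, sd = 0.09)

# Difference in mean log(rms); the direction (P4M minus P2) is an assumption
d <- draws_p4m - draws_p2

# Posterior probability that the difference exceeds zero
p_gt0 <- mean(d > 0)

# Bin the probability so that it can be mapped to a fill colour (hypothetical bins)
p_bin <- cut(p_gt0, breaks = c(0, 0.5, 0.9, 0.95, 0.99, 1), include.lowest = TRUE)
```

The probability is simply the share of posterior draws on one side of zero, so no distributional assumption is needed once the draws are available.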

3.2 Psychophysical Data

We can do the same for the display duration thresholds from our psychophysics experiment. Here we are looking for the opposite effect, namely that display durations are larger for subgroups than for supergroups (see main paper), so we calculate the probability that the differences in duration are smaller than zero.

Figure 3.2: Distributions of the difference in mean log display duration threshold between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicates the probability of the difference being less than zero.

3.3 Control EEG Data

We will now do exactly the same with the control data (odd harmonic data from parietal electrodes and even harmonic data from occipital electrodes).

Figure 3.3: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicates the probability of the difference being greater than zero.

Figure 3.4: Distributions of the difference in mean log(rms) between sub- and super-groups. The index of each relationship is indicated by the colour of the y-axis label. The fill of the density plots indicates the probability of the difference being greater than zero.

3.4 Summary

We can summarise the subgroup comparison plots above by plotting ROC curves for each of our four measurements (Figure 3.5).

Figure 3.5: This figure shows how many of our 64 comparisons are classed as having a greater-than-zero difference (less-than-zero for the display durations) for different thresholds between 0.5 and 1.0.

If we take p = 0.95 as our cut-off, we can see that the subgroup relations are preserved in 56/64 = 87.5% and 49/64 = 76.6% of the comparisons for the primary EEG and display durations respectively. This compares to 32/64 = 50.0% and 22/64 = 34.4% for the control EEG conditions.
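The percentages above, and the counting that underlies the summary curves, can be sketched in a few lines. The counts are those reported in the text. The labels `control_1` and `control_2` simply follow the order in which the control conditions are reported, and the vector `p` holds hypothetical probabilities, not values from the analysis.

```r
# Counts reported at the 0.95 cut-off, out of 64 comparisons
n_pass <- c(eeg = 56, threshold = 49, control_1 = 32, control_2 = 22)
pct <- round(100 * n_pass / 64, 1)

# Counting comparisons at or above a range of cut-offs.
# 'p' is a hypothetical set of posterior probabilities, for illustration only.
p <- c(0.999, 0.97, 0.96, 0.82, 0.55, 0.40)
cutoffs <- seq(0.5, 1, by = 0.05)
n_above <- sapply(cutoffs, function(k) sum(p >= k))
```

Plotting `n_above` against `cutoffs` for each measure gives one curve per data set, and the curve can only stay flat or fall as the cut-off becomes stricter.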

4 Additional Analysis

4.1 Index and Normality

Subgroup relations can be classified by their index, and by whether they are normal or not. Here we investigate the extent to which these two variables can account for the variation between the subgroup relationships.

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_value ~ index * normal 
##    Data: comp_summary$eeg (Number of observations: 63) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept        0.12      0.11    -0.09     0.32 1.00     2653     2731
## index            0.03      0.02    -0.00     0.07 1.00     2626     2997
## normal          -0.01      0.13    -0.27     0.25 1.00     2018     2116
## index:normal     0.05      0.03    -0.02     0.11 1.00     2466     2539
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.19      0.02     0.16     0.23 1.00     3476     2668
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: mean_value ~ index * normal 
##    Data: comp_summary$threshold (Number of observations: 63) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept       -0.27      0.19    -0.64     0.09 1.00     2546     2543
## index           -0.04      0.03    -0.10     0.02 1.00     2524     2282
## normal           0.16      0.24    -0.30     0.65 1.00     2147     2313
## index:normal    -0.05      0.06    -0.18     0.07 1.00     2456     2570
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.35      0.03     0.29     0.42 1.00     2501     2380
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We can see that the index of the subgroup relationship has an effect on both the difference in log(rms) and the difference in log(display duration): relationships with a higher index lead to larger differences.

Figure 4.1: The effect of index and normality on log(rms) and log(ms).
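To read the interaction model, it helps to write out the slopes it implies. The sketch below uses the posterior means from the EEG summary above and assumes that `normal` is coded 0 (not normal) or 1 (normal). The `predict_diff` helper is hypothetical and ignores all posterior uncertainty.

```r
# Posterior means from the EEG model summary (mean_value ~ index * normal)
b <- c(intercept = 0.12, index = 0.03, normal = -0.01, index_normal = 0.05)

# Expected difference for a relation with a given index;
# 'normal' is assumed to be coded 0 (not normal) or 1 (normal)
predict_diff <- function(index, normal) {
  b[["intercept"]] + b[["index"]] * index +
    b[["normal"]] * normal + b[["index_normal"]] * index * normal
}

slope_not_normal <- b[["index"]]                        # change per unit of index
slope_normal     <- b[["index"]] + b[["index_normal"]]  # slope for normal relations
predict_diff(index = c(2, 4), normal = 1)
```

The interaction term is added to the index slope only for normal relations, which is why the two lines in the figure can differ in steepness.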

4.2 Correlation Between Primary EEG data and Psychophysical Thresholds

Finally, we will investigate whether there is a correlation between our primary EEG measure (rms amplitude of odd harmonics over occipital cortex) and our display duration thresholds. As our two measures come from different samples of participants, we are unable to do a direct comparison. However, we can use the results of the models discussed in Section 3 and check for a correlation between the predicted values of the two measures.

##    Estimate  Est.Error      Q2.5     Q97.5
## R2 0.184591 0.07666008 0.0408367 0.3321432

We can see that although the correlation is relatively weak, the credible interval indicates that we can be reasonably confident that \(R^2>0\) (i.e., the 95% credible interval is 0.041 to 0.332).

Figure 4.2: Scatter plot showing the correlation between our two measures. Each line is a sample from the posterior of a Bayesian linear regression.